PROTEIN LAYERS AND THEIR USE IN ELECTRON MICROSCOPY



The present invention relates to protein layers which repeat regularly in two dimensions. In 
one aspect, the protein layers are based on symmetrical oligomer assemblies capable of self-assembly 
from the monomers of the oligomer assembly. The layers may have pores with dimensions of the 
order of nanometres to hundreds of nanometres. The protein layers are nanostructures which have
many potential uses, for example as a matrix to support molecular entities for electron microscopy, or 
X-ray crystallography. In another aspect, the invention relates to the use of protein layers for 
performing electron microscopy. 

WO-00/68248 discloses regular protein structures based on symmetrical oligomer assemblies
capable of self-assembly. In particular, WO-00/68248 discloses structures formed from protein
protomers (referred to as a "fusion protein" in WO-00/68248) comprising at least two monomers
(referred to as "oligomerization domains" in WO-00/68248) which are each monomers of a
respective symmetrical oligomer assembly. Self-assembly of the monomers into the oligomer
assembly causes assembly of the regular structures themselves. Several different types of structures
are disclosed, including discrete structures and structures extending in one, two and three dimensions.

In WO-00/68248, the relative orientations of the monomers within the protomers are selected
to provide the desired regular structure upon self-assembly. The monomers are fused together through
a rigid linking group which is carefully selected to provide the requisite relative orientation of the
monomers in the protomer. For example, in the laboratory production reported in WO-00/68248, the
selection of the protomer was performed using a computer program to model monomers connected by
a linking group in the form of a continuous, intervening alpha-helical segment over a range of
incrementally increased lengths. Thus, for example, the lattices suggested in WO-00/68248 having a
regular structure repeating in three dimensions are formed from protomers comprising two monomers
of respective dimeric or trimeric oligomer assemblies which are symmetrical about a single rotational
axis. The relative orientation of the two monomers is selected to provide a specific angle of
intersection between the rotational symmetry axes of the two oligomer assemblies. Thus, there is a
single fusion between the two oligomer assemblies and the relative orientation of the oligomer
assemblies is controlled by careful selection of the linking group providing the fusion. WO-00/68248
only reports laboratory production of protein structures of a discrete cage and a filament extending in
one dimension.

It is expected that application of the teaching of WO-00/68248 to protein layers repeating in
two dimensions would encounter the following difficulties. Firstly, it is expected that there would be
a difficulty in design arising from the requirement to select the relative orientation of the monomers
within the protomer appropriate for constructing a layer. This would probably reduce the number of
types of oligomer assembly available to form a protein layer, and hence make it difficult to identify
suitable proteins. Secondly, it is expected that practical difficulties would be encountered during
assembly. The structures disclosed in WO-00/68248 rely on the rigidity of the fusion between
monomers in protomers which forms the single fusion between oligomer assemblies. WO-00/68248
teaches that the relative orientation of the monomers in the protomers controls the relative orientation
of the oligomer assemblies in the resultant structure, so it is expected that flexing of the fusion away
from the desired relative orientation would reduce the reliability of self-assembly. It is expected that
such a problem would become more acute as the size of the repeating unit increases, thereby
providing a practical restriction on the reliable production of lattices with relatively large pore
sizes.

It would be desirable to provide protein layers having a different type of structure in which 
these expected problems might be alleviated.

According to a first aspect of the present invention, there is provided a protein layer which
repeats regularly in two dimensions,

the protein layer comprising protein protomers which each comprise at least two monomers
genetically fused together, the monomers each being monomers of a respective oligomer assembly,
the protomers comprising:

a first monomer which is a monomer of a first oligomer assembly belonging to a dihedral
point group of order O, where O equals 3, 4 or 6, and having a set of O rotational symmetry axes of
order 2 extending in two dimensions; and

a second monomer genetically fused to said first monomer which second monomer is a
monomer of a second oligomer assembly having a rotational symmetry axis of order 2,

the first monomers of the protomers are assembled into said first oligomer assemblies and the
second monomers of the protomers are assembled into said second oligomer assemblies, said
rotational symmetry axis of said second oligomer assemblies of order 2 being aligned with one of
said set of rotational symmetry axes of order 2 of one of said first oligomer assemblies with two
protomers being arranged symmetrically therearound.

As a result of using a second oligomer assembly having a rotational symmetry axis of the
same order 2 as the set of O rotational symmetry axes of said first oligomer assembly, the oligomer
assemblies are fused with those symmetry axes being aligned and with 2 protomers arranged
symmetrically therearound. This means that there is a 2-fold fusion between the first and second
oligomer assemblies.

Furthermore, the repeating pattern of the protein layer is derived from the arrangement of the
rotational symmetry axes of the first oligomer assembly and is not dependent on the relative
orientation of the monomers within the protomer. As the first oligomer assembly is dihedral, the set
of O symmetry axes of order 2 are coplanar. Therefore the protomers assemble into a layer having the
same symmetry as the set of O symmetry axes.
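
The coplanarity of the two-fold axes of a dihedral point group can be checked numerically. The following is a minimal Python sketch offered for illustration only, not part of the disclosed method. It assumes a coordinate convention (principal axis along z, first two-fold axis along x), and the function names are hypothetical.

```python
import math


def two_fold_axes(order):
    """Return unit vectors for the two-fold axes of the dihedral group D_order.

    Assumed convention: the principal axis of order `order` lies along z,
    and axis k lies in the xy-plane at k * 180/order degrees from x.
    """
    return [
        (math.cos(k * math.pi / order), math.sin(k * math.pi / order), 0.0)
        for k in range(order)
    ]


def half_turn(axis, point):
    """Rotate `point` by 180 degrees about the unit vector `axis` through the origin."""
    dot = sum(a * p for a, p in zip(axis, point))
    return tuple(2.0 * dot * a - p for a, p in zip(axis, point))


for order in (3, 4, 6):
    axes = two_fold_axes(order)
    # Every axis has a zero z-component, so the whole set lies in one plane.
    coplanar = all(axis[2] == 0.0 for axis in axes)
    print(f"D{order}: {len(axes)} two-fold axes, coplanar={coplanar}")
```

Applying `half_turn` twice about the same axis returns the original point, which is what makes each axis an axis of order 2.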

Therefore, protein layers in accordance with the present invention may be designed by
selecting oligomer assemblies with appropriate symmetry to build a layer repeating in two
dimensions. Protomers are produced comprising monomers of the selected oligomer assemblies fused
together. Subsequently, the protomers are allowed to self-assemble under suitable conditions.

To assist in understanding, reference is made to Fig. 1 which illustrates a particular example
of a protein layer 1 in accordance with the present invention. Fig. 1 shows only a part of the protein
layer 1 which repeats indefinitely in two dimensions. The protein layer 1 is assembled from protomers
2. The protein layer 1 comprises a first oligomer assembly 3 which in this example belongs to a
dihedral point group of order 4 and so has a set of 4 rotational symmetry axes of order 2 (in addition
to a single rotational symmetry axis of order 4). Each of the monomers 5 of the first oligomer
assembly 3 is fused to a second monomer 6 of a second oligomer assembly 4 which in this example
belongs to the dihedral point group of order 2, so having a rotational symmetry axis of order 2. As a
result, the second monomers 6 are assembled into the second oligomer assemblies 4 arranged with
their rotational symmetry axes of order 2 aligned along the rotational symmetry axes of order 2 of the
first oligomer assembly 3, and with a 2-fold fusion between the first and second oligomer assemblies
3 and 4. Thus, the symmetry of the protein layer 1 is the same as the symmetry of the set of four
rotational symmetry axes of order 2, in this case rotational symmetry of order 4.

Accordingly, the present invention involves the use of a different class of oligomer
assemblies from that used in WO-00/68248. The present invention provides the benefit that one is not
restricted by the need to control the relative orientation of the monomers within the protomer. Thus
the design of the protein structure is assisted in that the relative orientation of the monomers within the
protomer is a less critical constraint. Similarly, more reliable assembly of the protein layer is
possible, as described in more detail below.

According to other aspects of the present invention, there is provided an individual protomer
capable of self-assembly to form such a protein layer, as well as polynucleotides encoding the
protomer, vectors and host cells capable of expressing the protomer and methods of making the
protomer.

It has been appreciated that a particularly advantageous use of a protein layer which repeats
regularly in two dimensions is to perform electron microscopy of a molecular entity. Thus, in
accordance with a second aspect of the present invention, there is provided a method of performing
electron microscopy of a molecular entity, comprising:

providing a protein layer having a structure which repeats regularly in two dimensions and
which supports molecular entities each attached at a predetermined position in the repeating structure
of the protein layer; and

performing electron microscopy of the protein layer having the molecular entities supported
thereon to derive an image.

The method is applicable to any protein layer which repeats regularly in two dimensions,
including but not limited to a protein layer in accordance with the first aspect of the present
invention.

Thus the protein layer acts as a support for the molecular entities. As the molecular entities
are each attached at a predetermined position in the repeating structure of the protein layer, the
molecular entities are supported in a regular array. This provides significant advantages in electron
microscopy because it allows imaging of large numbers of the individual molecular entities in known
positions. This facilitates various forms of data analysis of the derived image, thereby allowing
investigation of the structure of the molecular entity.

The present invention will now be described in more detail by way of non-limitative example
with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram of a protein layer;

Fig. 2 is a schematic diagram of a protein layer which includes a heterologous oligomer
assembly;

Fig. 3 is an electron micrograph of a specific protein layer which has been prepared;

Fig. 4 is a schematic diagram of a transmission electron microscope; and

Fig. 5 is a flowchart of a method of performing electron microscopy.

Protein layers in accordance with the present invention may be designed by selecting
oligomer assemblies which, when fused together with rotational symmetry axes of order 2 aligned
with each other, produce a repeating unit which is capable of repeating in two dimensions. As the
symmetry of the repeating unit, and hence the protein layer as a whole, depends on the symmetry of
the oligomer assemblies, this involves a selection of oligomer assemblies having a quaternary
structure which provides appropriate symmetries. This is a straightforward task, because the
symmetries of oligomer assemblies are generally available in the scientific literature on proteins, for
example from The Protein Data Bank; H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N.
Bhat, H. Weissig, I. N. Shindyalov & P. E. Bourne; Nucleic Acids Research, 28 pp. 235-242 (2000)
which is the single worldwide archive of structure data of biological macromolecules, also available
through websites such as http://www.rcsb.org.

In some cases, the repeating unit repeats in the same orientation across the layer. In other 
cases, two or more adjacent repeating units together form a unit cell which repeats in the same 
orientation across the layer, but with the repeating units within a unit cell arranged in different
orientations. 

Examples of oligomer assemblies which produce structures which repeat regularly in two 
dimensions are given below. 

The first oligomer assembly belongs to a dihedral point group of order O, where O equals 3,
4 or 6 and so has a quaternary structure with rotational symmetry axes extending in two dimensions,
including a set of O rotational symmetry axes of order 2 which are coplanar, in addition to a single
rotational symmetry axis of order O which is perpendicular thereto.

The second oligomer assembly has a quaternary structure with a rotational symmetry axis of 
the same order 2 as the set of O rotational symmetry axes of said first oligomer assembly. For 
example, the second oligomer assembly may belong to a dihedral point group of order 2 or to a cyclic 
point group of order 2. The second oligomer assembly does not have a rotational symmetry axis of
order O. 

In the assembled first oligomer assembly, inevitably and by definition, there are groups of
first monomers arranged symmetrically around each of the set of O rotational symmetry axes of order
2 of the first oligomer assembly. This is because the symmetry results from the identical monomers
being so arranged around the rotational symmetry axes.

As a result of the second monomers fused to the first oligomer assembly being arranged
symmetrically around one of the set of O rotational symmetry axes of order 2 of the first oligomer
assembly, it follows that the second oligomer assembly is held with the group of fused second
monomers also held symmetrically around that one of the set of O rotational symmetry axes of order
2 of the first oligomer assembly.

In addition, inevitably and by definition, the second monomers also assemble in the second
oligomer assembly in a symmetrical arrangement around the rotational symmetry axis of order 2 of
the second oligomer assembly. Thus, the result of the second oligomer assembly having a rotational
symmetry axis of the same order 2 as the set of O rotational symmetry axes of the first oligomer
assembly is that the first and second oligomer assemblies assemble with their symmetry axes of order
2 aligned with one another. It follows from the symmetry of both oligomer assemblies that this is the
most stable arrangement. This results in a 2-fold fusion between the first and second oligomer
assemblies. In each of the first and second oligomer assemblies, there are 2 monomers arranged
around the rotational symmetry axis, each of the monomers being fused within a respective protomer
to a monomer of the other oligomer assembly.

As previously mentioned, the set of rotational symmetry axes does not include all the
rotational symmetry axes of the first oligomer assembly. Rather the set comprises the rotational
symmetry axes of the first oligomer assembly which are of the same order 2 as the rotational symmetry
axis of the second oligomer assembly.

The particular choice of symmetries of the first and second oligomer assemblies results, on
assembly of the protomers into the layer, in the oligomer assemblies being built up with their
rotational symmetry axes aligned. Thus, the relative arrangement of the fused oligomer assemblies,
and hence the protein layer as a whole, are derived from the arrangements of the rotational
symmetry axes of the first oligomer assembly and the second oligomer assembly. In particular, the
protein layer has the same symmetry as the set of O rotational symmetry axes of order 2. The
symmetry of the protein layer is not dependent on the relative orientation of the monomers within
the protomer. In other words, the present invention provides the advantage that the two dimensional
repeating pattern of the protein layer may be based solely on the arrangements of the rotational
symmetry axes of the oligomer assemblies. This provides advantages in the design of a protein layer
by making it easy to select appropriate oligomer assemblies for use in the protein layer. During
design, the relative orientation of the monomers within an individual protomer in its unassembled
form becomes a much weaker constraint than is present in, for example, WO-00/68248.

There are also advantages during self-assembly of the layer. In particular, the formation of a 
2-fold fusion between two given oligomer assemblies results in the bond between the two oligomer 
assemblies being relatively rigid. This reduces relative motion of the oligomer assemblies during the 
assembly process and assists in reliable formation of the layer with the oligomer assemblies in the
correct relative positions. 

The form and production of the protomers will now be described. Although the present
invention uses protomers which are different in that they comprise different monomers from
WO-00/68248, the form and production of the protomers per se, as well as the polynucleotide encoding
the protomers, may be the same as disclosed in WO-00/68248 which is therefore incorporated
herein by reference.

The nature of the monomers themselves will now be described. 

The monomers are monomers of oligomer assemblies which are capable of self-assembly
under suitable conditions to produce a protein layer. The secondary and tertiary structure of the
monomers is unimportant in itself providing they assemble into a quaternary structure with the
required symmetry. However, it is advantageous if the protein is easily expressed and folded in a
heterologous expression system (for example using a plasmid expression vector in E. coli).

The monomers may be naturally occurring proteins, or may be modified by peptide elements 
being absent from, substituted in, or added to a naturally occurring protein provided that the 
modifications do not substantially affect the assembly of the monomers into their respective oligomer
assembly. Such modifications are in themselves known for a number of different purposes which may 
be applied to monomers of the present invention. In other words, the monomer may be a homologue 
and/or fragment and/or fusion protein of a naturally occurring protein. 

The monomer may be chemically modified, e.g. post-translationally modified. For example, 
it may be glycosylated or comprise modified amino acid residues.

Although the monomers may be fused directly together, preferably the monomers are fused 
by a linking group of peptide or non-peptide elements. In general, linking two proteins by a linking 
group is known for other purposes and such linking groups may be applied to the present invention.

Another factor in the selection of appropriate oligomer assemblies is the location and
orientation of (a) the termini of the first monomers when arranged in the first oligomer assembly in
its natural form (i.e. not fused to a second oligomer assembly) and (b) the termini of the second
monomers when arranged in the second oligomer assembly in its natural form (i.e. not fused to the
first oligomer assembly). Such information on the arrangement of the termini in the oligomer
assembly in its natural form is generally available for oligomer assemblies, for example from The
Protein Data Bank referred to above. Ideally, these termini should have the same separation and
orientation, because they will be fused together in the assembled protein layer to constitute the 2-fold
fusion arranged symmetrically around a rotational symmetry axis. That being said, it is not essential
for the separation and orientation to be the same, because any difference may be accommodated by
deformation of the monomers near the 2-fold fusion and/or by use of a linking group. Therefore, as a
general point, oligomer assemblies should be chosen in which the termini of both oligomer
assemblies which are to be fused together in a 2-fold fusion allow formation of the fusion without
preventing assembly of the oligomer assemblies and hence the protein layer.

Considering the deformation of the monomers near the 2-fold fusion mentioned above, it is
desirable to minimise such deformation, which will tend to reduce the reliability of the assembly
process. However, if a linking group is fused between the monomers, such deformation may be taken
up, at least partially, by the linking group itself. This reduces the deformation of the monomers,
thereby increasing the reliability of self-assembly, because the linking group is not part of the
naturally occurring protein and so does not take part in the assembly process. This is a particular
advantage of the use of a linking group.

Furthermore, the linking group may be specifically designed to be oriented relative to the
first and second monomers in the protomer in its normal form, prior to assembly, to reduce such
differences in the position and/or orientation of the termini of the first and second monomers. Using
the information on the position and orientation of the termini of the first and second monomers in the
first and second oligomer assemblies in their natural form, which is generally available for oligomer
assemblies as discussed above, it is possible to design an appropriate linking group using
conventional modelling techniques.

Typically, the monomers are fused at their termini. Alternatively, the monomers may be
fused at an alternative location in the polypeptide chain so long as the native fold and symmetry of
the naturally occurring oligomer assembly remains the same. For example, one of the monomers may
be inserted into a structurally tolerant portion of the other monomer, for example in a loop extending
out of the oligomer assembly. Also, truncation of a monomer is feasible and may be estimated by
structural examination.

Some examples of symmetries for the oligomer assemblies to produce a protein layer which 
repeats in two dimensions are as follows. 

In these examples, the first oligomer assembly belongs to a dihedral point group of order O,
where O equals 3, 4 or 6. Hence the first oligomer assembly has a principal rotational symmetry axis
of order O and also O rotational symmetry axes of order 2 which all extend perpendicular to the
principal rotational symmetry axis. In order to develop a layer extending in two dimensions, the
second oligomer assembly is chosen to have a rotational symmetry axis of order 2 to align with the O
rotational symmetry axes of order 2 of the first oligomer assembly with a 2-fold fusion between the
first and second oligomer assemblies. Therefore, in this case, the O rotational symmetry axes of order
2 constitute the set of rotational symmetry axes of the first oligomer assembly, i.e. N equals O.

In some classes of protein layer, the protomers are homologous with respect to the
monomers, i.e. there is a single type of protomer within the protein layer. In this case, the second
oligomer assembly may belong to a dihedral point group of order 2.

For example, Table 1 represents some simple homologous protomers capable of forming a
protein layer.

Protomer    M     N    Layer Symmetry
d3d2        6     2    P622
d4d2        8     2    P422
d6d2        12    2    P622

Table 1 - Homologous Protomers

In Table 1, each protomer is identified by letters which represent the oligomer assemblies to
which the respective monomers of the protomer belong. In particular the letter d represents a dihedral
point group and the following number identifies the order of the dihedral point group. In the next two
columns of Table 1, there is given the number M of first monomers in the first oligomer assembly and
the order N of the set of rotational symmetry axes of the first oligomer assembly which in this case is
2. The final column gives the symmetry of the resulting protein layer. In each of these cases, the
second oligomer assembly belongs to a dihedral point group of order 2.
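
The rows of Table 1 follow a simple pattern that can be sketched in Python. This is an illustration only: it assumes that the first oligomer assembly has one monomer per symmetry operation of its dihedral point group (so M = 2 x O, which matches the tabulated values), the layer symmetries are copied from Table 1 rather than derived, and the function and key names are hypothetical.

```python
# Layer symmetries copied from Table 1; they are looked up, not derived.
LAYER_SYMMETRY = {3: "P622", 4: "P422", 6: "P622"}


def table_row(order):
    """Build one row of Table 1 for a first oligomer assembly of dihedral order `order`."""
    if order not in LAYER_SYMMETRY:
        raise ValueError("order must be 3, 4 or 6")
    monomers = 2 * order  # M: assumes one monomer per symmetry operation of the dihedral group
    axis_order = 2        # N: order of the rotational symmetry axes in the set
    return {
        "protomer": f"d{order}d2",
        "M": monomers,
        "N": axis_order,
        # Each 2-fold fusion uses N first monomers, giving M / N fusions per first assembly.
        "fusions_per_assembly": monomers // axis_order,
        "layer_symmetry": LAYER_SYMMETRY[order],
    }


for order in (3, 4, 6):
    print(table_row(order))
```

Under these assumptions the number of 2-fold fusions per first oligomer assembly equals O, in line with the set of O rotational symmetry axes of order 2.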

Thus it is easy to visualise the protein layers. In particular, the first oligomer assembly may be
visualised as a node from which the set of O rotational symmetry axes of order 2 extend outwardly in
a common plane, perpendicular to the principal rotational symmetry axis of order O. The second
oligomer assemblies may be visualised as linear links extending from the node aligned with
respective ones of the set of O rotational symmetry axes of order 2 of the first oligomer assemblies.
In this way, it is easy to visualise the formation of the layer with pores in the spaces between the
oligomer assemblies. Thus it will be seen that the symmetry of the layer derives from the symmetrical
arrangement of the set of O rotational symmetry axes of order 2 of the first oligomer assemblies.
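
The node-and-link picture can be sketched in a few lines of Python. This is a simplified geometric model, not a structural prediction: it assumes that the O links leave a node at evenly spaced angles of 360/O degrees in the plane of the layer, that a second oligomer assembly sits at the midpoint of each link, and that `spacing` is a hypothetical node-to-node distance in arbitrary units.

```python
import math


def link_angles(order):
    """Angles (in degrees) of the O links leaving a node, assumed evenly spaced."""
    if order not in (3, 4, 6):
        raise ValueError("order must be 3, 4 or 6")
    return [k * 360 // order for k in range(order)]


def neighbour_nodes(order, spacing=1.0):
    """Positions of the neighbouring nodes around a node placed at the origin."""
    return [
        (spacing * math.cos(math.radians(a)), spacing * math.sin(math.radians(a)))
        for a in link_angles(order)
    ]


def link_midpoints(order, spacing=1.0):
    """Positions of the second oligomer assemblies, taken as the link midpoints."""
    return [(x / 2.0, y / 2.0) for x, y in neighbour_nodes(order, spacing)]


def same_orientation_across_link(order):
    """True if a neighbouring node can have the same orientation as the central node.

    A link leaving one node at angle a arrives at its neighbour at angle a + 180,
    so the neighbour needs a link in that reversed direction.
    """
    angles = set(link_angles(order))
    return {(a + 180) % 360 for a in angles} == angles


for order in (3, 4, 6):
    print(order, link_angles(order), same_orientation_across_link(order))
```

In this model, nodes of order 4 or 6 repeat in a single orientation, whereas neighbouring nodes of order 3 must alternate between two orientations, which is consistent with the earlier remark that repeating units within a unit cell may be arranged in different orientations.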

In one type of protein layer in which the protomers are homologous with respect to the 
monomers, the second oligomer assembly is a homologous oligomer assembly. In this case the 
protein layer consists solely of the protomers. 

In another type of such a protein layer in which the protomers are homologous with respect to
the monomers, the second oligomer assembly is a heterologous oligomer assembly of said second
monomers and of third monomers. In this case, the protein layer consists of the protomers and in
addition the third monomers assembled with said second monomers into said second oligomer
assembly.

Thus, the protomer by itself cannot assemble into the entire protein layer. The second
monomers of the heterologous oligomer assembly cannot self-assemble into the entire heterologous
oligomer assembly in the absence of the third monomers of that heterologous
oligomer assembly. This provides advantages during manufacture of the protein layers, because first
oligomer assemblies may be assembled without assembly of an entire protein layer which might
otherwise disrupt the production of the protomer. This allows production in a two-stage process.

A particular heterologous oligomer assembly which may be used to advantage as the second
oligomer assembly is one comprising monomers which have a binding site capable of binding to
biotin or a peptide, and aptamers which are capable of binding to said binding site,
preferably non-covalently. The aptamers are used as the second monomer of the protomer. The
monomers which have a binding site capable of binding to biotin are a third monomer of the protein
layer which is not genetically fused within a protomer. On assembly of the second oligomer
assembly, the third monomers assemble to each other and the aptamers assemble into the second
oligomer assembly by each binding to a respective third monomer.

This is shown schematically in Fig. 2, which shows an example of a part of the protein lattice
1 including a single second oligomer assembly 4 of this type, the protein lattice otherwise repeating
in the same manner as the example shown in Fig. 1. In this example, the first oligomer assembly
3 belongs to a dihedral point group of order 4 and so has a set of four rotational symmetry axes of
order 2. Each of the monomers 5 of the first oligomer assembly 3 is fused to a second monomer 6
being an aptamer. The protein lattice 1 further comprises third monomers 7 which are assembled
together as part of the second oligomer assembly 4. The second monomers 6 assemble into the second
oligomer assembly 4 by each binding to a respective third monomer 7. Thus, in the second oligomer
assembly 4, the second monomers 6 are held with the same symmetry as the third monomers 7, but
the second monomers 6 are not assembled to each other.

This provides advantages in assisting the formation of the protein lattice. The protein lattice 1
still has a 2-fold fusion between a first oligomer assembly 3 and a second oligomer assembly 4, due
to both oligomer assemblies 3 and 4 having a symmetry axis of order 2, as discussed above. However
this is achieved without the second monomers 6 themselves needing to assemble to each other. This
assists the assembly of the first oligomer assembly 3, in contrast to the protein lattice 1 shown in
Fig. 1 in which both the first and second oligomer assemblies 3 and 4 need to simultaneously
assemble. Instead the third monomers 7 assemble and the second monomers 6 each individually
assemble to a respective third monomer 7.

The third monomers typically comprise a binding site. Such a binding site may be capable of
binding to peptides or non-peptide moieties. In a preferred embodiment the binding site is capable of
binding to biotin. In this case, the third monomer may be of any type having such a binding site, for
example streptavidin, avidin or Neutravidin.

The terms "streptavidin", "avidin" or "Neutravidin" as used herein cover variants of these 
molecules, unless the context requires otherwise. Such variants are typically homologues of the 
original sequences, i.e. are usually homologues of the sequences shown in SEQ ID NO:4 or SEQ ID
NO:5. The variants may be fragments of the original sequences or of homologues of the original 
sequences. The variant proteins may comprise additional sequences (typically non-streptavidin, 
non-avidin or non-Neutravidin sequence), and thus be fusion proteins which comprise said original 
sequences, homologues or fragments. 

Preferably the variant sequences retain the structural properties of the original sequences,
such as any structural property mentioned herein. Further the variant sequences generally retain the
ability to bind biotin and/or a peptide (such as the peptide of SEQ ID NO:3). In one embodiment the
variant sequence is capable of being recognised by an antibody which is capable of recognising the
original sequence. The variant sequences will of course retain the property of forming a protein layer
as described herein.

The second monomers which are aptamers capable of binding to the binding site may be any
of a range of peptide tags, including without limitation streptag I, streptag II, or nanotag. Preferred
aptamers are peptides which are 7 to 20 amino acids long, for example 9 to 15 amino acids in length.
The aptamer may have homology with SEQ ID NO. 3, having for example at least 6, 7 or 8
amino acids in common with (i.e. the same as) SEQ ID NO. 3.

In general the first oligomer assembly 3 may be of any type having the required symmetry.
One possible example is E. coli ALAD (delta-aminolevulinic acid dehydratase). Other criteria for
selection of the first oligomer assembly are set out below.

The aptamer may be fused to a terminus of the first monomer. Where the terminus is used,
the first oligomer assembly should preferably possess a terminus lying close to a symmetry axis of
order 2 (typically within 15 Å).
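
The proximity criterion can be checked from atomic coordinates with elementary geometry. The following Python sketch is illustrative only: it assumes that the coordinates of the terminus and a description of the symmetry axis (a point on the axis and a direction vector, both in ångströms) have already been obtained from a structure file, and all names and example values are hypothetical.

```python
import math


def distance_to_axis(point, axis_point, axis_direction):
    """Perpendicular distance from `point` to the line through `axis_point` along `axis_direction`."""
    norm = math.sqrt(sum(d * d for d in axis_direction))
    if norm == 0.0:
        raise ValueError("axis_direction must be non-zero")
    unit = [d / norm for d in axis_direction]
    offset = [p - a for p, a in zip(point, axis_point)]
    # Remove the component of the offset that lies along the axis.
    along = sum(o * u for o, u in zip(offset, unit))
    perpendicular = [o - along * u for o, u in zip(offset, unit)]
    return math.sqrt(sum(c * c for c in perpendicular))


def terminus_is_close(point, axis_point, axis_direction, cutoff=15.0):
    """True if the terminus lies within `cutoff` (default 15 angstroms) of the axis."""
    return distance_to_axis(point, axis_point, axis_direction) <= cutoff


# Hypothetical terminus position and a two-fold axis along z through the origin.
terminus = (3.0, 4.0, 10.0)
print(distance_to_axis(terminus, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
print(terminus_is_close(terminus, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

The default cutoff reflects the typical value given above; a different threshold can be passed where a looser or tighter criterion is appropriate.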

Alternatively, the aptamer may in general be fused at a position other than the terminus
provided that the quaternary structure of the first oligomer assembly remains substantially
unaffected and provided that the aptamer is one which does not require to be fused to a terminus. For
example, Streptag I requires a free C-terminus in order to bind streptavidin. Again it is preferable for
the aptamer to be fused at a position within the peptide sequence of the first monomer resulting in the
aptamer being located in the assembled oligomer assembly at a position lying close to a symmetry
axis of order 2 (typically within 15 Å).

Optionally, there may be a linking group in the protomer between the first monomer and the
second monomer which is an aptamer. Typically the linking group might be of length in the range
from 1 to 10 amino acids. The linking group might advantageously provide flexibility which assists in
the assembly of the lattice. For example, in the case that the first monomer is E. coli ALAD and
the second monomer is streptag I, it is preferred to provide a linking group of length one or two
amino acids.

Optionally, additional protein fusions may be genetically fused to any free termini of the first
monomer or third monomer. This might be done to permit functionalisation of the lattice. Specific
non-limitative examples of suitable additional proteins are hexa-histidine tags, specific affinity
peptides, ankyrin repeats and calmodulin, each of which has been shown in the literature to be
capable of genetic fusion to the N-terminus of E. coli ALAD-streptag I without affecting the ability
of this assembly to self-assemble into lattices.

In other classes of protein layer, the protomers are heterologous with respect to the
monomers, i.e. there are two or more types of protomer in the protein layer.

To achieve assembly of two types of protomer, the two types of protomer include different
monomers of the same heterologous oligomer assembly which may belong to a cyclic point group of
order 2. Thus, the first type of protomer comprises a first monomer which is a monomer of said first
oligomer assembly belonging to a dihedral point group of order O, where O equals 3, 4, or 6,
genetically fused to a second monomer which is a monomer of the second oligomer assembly, which is
the heterologous oligomer assembly belonging to a cyclic point group of order 2. Furthermore, the
second type of protomer comprises a third monomer which is a monomer of that second oligomer
assembly. In the second type of protomer, the third monomer is genetically fused to a fourth
monomer which is a monomer of a third oligomer assembly, the third oligomer assembly belonging to
a dihedral point group of order 2 or O.

Thus when the protomers of the different types are allowed to assemble, the heterologous
oligomer assemblies assemble, thereby linking the protomers of the two types. However, a single type
of protomer cannot by itself assemble into the entire protein layer. The individual monomers of the
heterologous oligomer assembly cannot self-assemble into the entire heterologous oligomer assembly
in the absence of the other, different monomers of that heterologous assembly. This provides
advantages during manufacture of the protein layers, because each type of protomer may be
separately produced and assembled into a respective, discrete component of the unit cell of the
repeating pattern, as a result of the monomers of the homologous first oligomer assembly
self-assembling, but without assembly of an entire protein layer. This is an advantage of the heterologous
protomers, because assembly of the layer may be avoided until the components are brought together.
Otherwise assembly of the layer might hinder the production of the protomers themselves. This
allows production in a two-stage process.

In the simplest types of protein layer, the first oligomer assembly of both types of protomer is
a monomer of a homologous oligomer assembly belonging to a dihedral point group. Thus the
individual types of protomer may. For example, Table 2 represents some simple heterologous
protomers capable of forming a protein layer.



Protomer            Components    1st Protomer    2nd Protomer    Layer
                                  M      N        M      N        Symmetry

d3c2A + d3c2A*      D3/D3         6      2        6      2        P622
d4c2A + d4c2A*      D4/D4         8      2        8      2        P422
d6c2A + d6c2A*      D6/D6         12     2        12     2        P622
d3c2A + d2c2A*      D3/D2         6      2        4      2        P622
d4c2A + d2c2A*      D4/D2         8      2        4      2        P422
d6c2A + d2c2A*      D6/D2         12     2        4      2        P622

Table 2 - Heterologous Protomers



In Table 2, the first column identifies the two types of protomer. Each protomer is identified
by letters which represent the oligomer assemblies to which the respective monomers of the protomer
belong. In particular the letter d represents a dihedral point group and the letter c represents a
monomer of a heterologous oligomer assembly belonging to a cyclic point group. The subscript
number again represents the order of the point group. The subscript capital letters A and A* are used
to identify the two different monomers of the same heterologous assembly.

In Table 2, the second column identifies the point groups to which the components resulting
from the assembly of each type of protomer belong. A similar notation is used as for the monomers
of the protomer, except that capital letters are used to indicate that the point group of the component
is being referred to. Thus capital letter D indicates that the component belongs to a dihedral point
group and the number gives the order of the point group.

In the next four columns of Table 2, there is given the number M of first monomers in the
first oligomer assembly and of fourth monomers in the third oligomer assembly, as well as the order N
(=2) of the set of O rotational symmetry axes of the first oligomer assembly and the third oligomer
assembly. The final column gives the symmetry of the resulting protein layer.

In all the examples of Table 2, the first oligomer assembly of the first type of protomer
belongs to a dihedral point group of order O, where O equals 3, 4 or 6.

In the first three examples of Table 2, the first oligomer assembly of the second type of
protomer belongs to a dihedral point group of order L, where L equals O. Thus these three examples
have spatially the same arrangement as the three examples of the corresponding homologous
protomers in Table 1. In the first three examples of Table 2, the first oligomer assemblies of the two
types of protomer may be the same oligomer assembly or may be different oligomer assemblies.

In the second three examples of Table 2, the first oligomer assembly of the second type of
protomer belongs to a dihedral point group of order L, where L equals 2. These three examples have
spatially the same arrangement as the three examples of the corresponding homologous protomers in
Table 1, except as follows. Instead of the two dihedral oligomer assemblies of order O being linked
by a single cyclic oligomer assembly, the link between the two dihedral oligomer assemblies of order
O is extended to be formed by a chain comprising two cyclic oligomer assemblies of order 2 on either
side of a dihedral oligomer assembly of order 2. Therefore, it will be seen that the repeating unit of
the heterologous oligomer assembly effectively extends the length of the links of the repeating unit
between the dihedral oligomer assemblies of order O which may be considered as nodes in the
protein layer. Thus, the size of the pores within the protein layer is also increased relative to the use
of the corresponding homologous protomers.
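
The entries of Table 2 follow a regular pattern: an oligomer assembly belonging to a dihedral point group of order n contains 2n monomers, and N is 2 throughout. The following Python sketch reproduces the rows of Table 2 from the two dihedral orders. It is illustrative only; the function names are hypothetical and the layer symmetries are simply copied from the table rather than derived.

```python
# Layer symmetry reported in Table 2 for each node order O (copied from the table).
LAYER_SYMMETRY = {3: "P622", 4: "P422", 6: "P622"}


def monomers_in_dihedral_assembly(order: int) -> int:
    """A dihedral point group D_n has 2n symmetry-related subunits."""
    return 2 * order


def table2_row(first_order: int, second_order: int) -> tuple:
    """Return (components, M1, N1, M2, N2, layer symmetry) for a pair of protomer types."""
    if first_order not in LAYER_SYMMETRY:
        raise ValueError("first protomer must be D3, D4 or D6")
    if second_order not in (2, first_order):
        raise ValueError("second protomer must be D2 or match the first")
    n = 2  # order of the two-fold axes used for linking
    return (
        f"D{first_order}/D{second_order}",
        monomers_in_dihedral_assembly(first_order), n,
        monomers_in_dihedral_assembly(second_order), n,
        LAYER_SYMMETRY[first_order],
    )
```

For example, `table2_row(4, 2)` corresponds to the row for d4c2A + d2c2A*.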

The above examples of protein layers are believed to represent the simplest form of
protomers capable of forming a protein layer and are preferred for that reason. However, it will be
appreciated that other protomers formed from monomers of oligomer assemblies having suitable
symmetries will be capable of forming a protein layer. For example, other homologous protomers
having larger numbers of monomers than listed in Table 1 will be capable of forming a protein layer.
Similarly, other heterologous protomers will be capable of forming a protein layer. These may
include two types of protomer having larger numbers of monomers than in the examples of Table 2,
or may include more than two types of protomer.

For each of the monomers, there is a large choice of oligomer assemblies having the required
symmetry. The present invention is not limited to particular oligomer assemblies, because in principle
any oligomer assembly having a quaternary structure with the requisite symmetry may be used.
However, by way of example, Table 3 lists some possible choices of oligomer assemblies of various point
groups including those in Tables 1 and 2.



Point Group     Source               Name of Oligomer Assembly               PDB Code

P 3 (T, 32)     E.coli               dps                                     1DPS
                S.epidermidis        EpiD                                    1G63
P 4 (O, 432)    Human                heavy chain ferritin                    2FHA
                E.coli               Dihydrolipoamide succinyltransferase    1E20
                A.vinelandii         Dihydrolipoamide acetyltransferase      1EAB
D2              Human                Mn superoxide dismutase                 1AP5
                P.falciparum         lactate dehydrogenase                   1CEQ
D3              Rat                  6-pyruvoyl tetrahydropterin synthase    1B66
                E.coli               Amino acid aminotransferase             1I1L
D4              E.coli               PurE                                    1QCZ
                Sipunculid worm      Hemerythrin                             2HMQ
D6              S.typhimurium        Glutamine Synthetase                    1F1H
c2A + c2A*      Human                Casein kinase alpha and beta chains     1JWH
c3A + c3A*      Coliphage T4         gp5 + gp27                              1K28
                HIV                  N36 + C34                               1ADC
                Pseudomonas putida   Naphthalene 1,2-Dioxygenase             1NDO
c4A + c4A*      Brachiopod           Hemerythrin                             N/A

Table 3 - Example oligomer assemblies

Thus the present invention provides a protein protomer or plural protein protomers capable of
assembly into a protein layer. The monomers of the protomer may be of any length but typically have
a length of 5 to 1000 amino acids, preferably at least 20 amino acids and/or preferably at most 500
amino acids.



The invention also provides polynucleotides which encode the protein protomers of the
invention. The polynucleotide will typically also comprise an additional sequence beyond the 5'
and/or 3' ends of the coding sequence. The polynucleotide typically has a length of at least three times
the length of the encoded protomer. The polynucleotide may be RNA or DNA, including genomic
DNA, synthetic DNA or cDNA. The polynucleotide may be single or double stranded.

The polynucleotides may comprise synthetic or modified nucleotides, such as
methylphosphonate and phosphorothioate backbones or the addition of acridine or polylysine chains
at the 3' and/or 5' ends of the molecule.

Such polynucleotides may be produced and used using standard techniques. For example, the 
comments made in WO-00/68248 about nucleic acids and their uses apply equally to the 
polynucleotides of the present invention. 

The monomers are typically combined to form protomers by fusion of the respective genes at
the genetic level (e.g. by removing the stop codon of the 5' gene and allowing an in-frame
read-through to the 3' gene). In this case the recombinant gene is expressed as a single polypeptide. The
genes may, alternatively, be fused at a position other than the terminus so long as the quaternary
structure of the oligomer assembly remains substantially unaffected. In particular, one
gene may be inserted within a structurally tolerant region of a second gene to produce an in-frame
fusion.
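
The end-to-end fusion described above (removal of the stop codon of the 5' gene followed by in-frame read-through) can be sketched at the sequence level as follows. This Python example is illustrative only: the function name, the optional linker argument and the toy sequences in the usage note are assumptions made for the sketch and are not sequences of the invention.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(five_prime_orf: str, three_prime_orf: str, linker: str = "") -> str:
    """Join two open reading frames so they are read as a single polypeptide.

    The stop codon of the 5' gene is removed; an optional linker (a whole
    number of codons) may be placed between the two coding sequences.
    """
    five, three, link = (s.upper() for s in (five_prime_orf, three_prime_orf, linker))
    for name, seq in (("5' ORF", five), ("3' ORF", three), ("linker", link)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} length is not a multiple of three")
    if five[-3:] in STOP_CODONS:
        five = five[:-3]  # remove the stop codon to allow read-through
    fused = five + link + three
    # Check every codon except the last for a premature stop in the fused frame.
    codons = [fused[i:i + 3] for i in range(0, len(fused) - 3, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("premature stop codon in fused reading frame")
    return fused
```

With toy inputs, `fuse_in_frame("ATGGCTTAA", "TGGCGTTAA")` returns `"ATGGCTTGGCGTTAA"`, in which the first stop codon has been removed and the second gene remains in frame. Insertion of one gene within a tolerant region of another would need a different helper.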

The invention also provides expression vectors which comprise polynucleotides of the
invention and which are capable of expressing a protein protomer of the invention. Such vectors may
also comprise appropriate initiators, promoters, enhancers and other elements, such as for example
polyadenylation signals which may be necessary, and which are positioned in the correct orientation,
in order to allow for protein expression.

Thus the coding sequence in the vector is operably linked to such elements so that they
provide for expression of the coding sequence (typically in a cell). The term "operably linked" refers
to a juxtaposition wherein the components described are in a relationship permitting them to function
in their intended manner.

The vector may be, for example, a plasmid, virus or phage vector. Typically the vector has an
origin of replication. The vector may comprise one or more selectable marker genes, for example an
ampicillin resistance gene in the case of a bacterial plasmid or a resistance gene for a fungal vector.

Promoters and other expression regulation signals may be selected to be compatible with the
host cell for which expression is designed. For example, yeast promoters include S. cerevisiae GAL4
and ADH promoters, and S. pombe nmt1 and adh promoters. Mammalian promoters include the
metallothionein promoter which can be induced in response to heavy metals such as cadmium. Viral
promoters such as the SV40 large T antigen promoter or adenovirus promoters may also be used.

Mammalian promoters, such as β-actin promoters, may be used. Tissue-specific promoters
are especially preferred. Viral promoters may also be used, for example the Moloney murine
leukaemia virus long terminal repeat (MMLV LTR), the Rous sarcoma virus (RSV) LTR promoter,
the SV40 promoter, the human cytomegalovirus (CMV) IE promoter, adenovirus, HSV promoters
(such as the HSV IE promoters), or HPV promoters, particularly the HPV upstream regulatory region
(URR).

Another method that can be used for the expression of the protein protomers is cell-free 
expression, for example bacterial, yeast or mammalian. 

The invention also includes cells that have been modified to express the protomers of the
invention. Such cells include transient, or preferably stable, higher eukaryotic cell lines, such as
mammalian cells or insect cells, using for example a baculovirus expression system, lower eukaryotic
cells, such as yeast, or prokaryotic cells such as bacterial cells. Particular examples of cells which may
be modified by insertion of vectors encoding a polypeptide according to the invention include
mammalian HEK293T, CHO, HeLa and COS cells. Preferably the cell line selected will be one
which is not only stable, but also allows for mature glycosylation of a polypeptide. Expression may
be achieved in transformed oocytes.

The protein protomers, polynucleotides, vectors or cells of the invention may be present in a 
substantially isolated form. They may also be in a substantially purified form, in which case they will 
generally comprise at least 90%, e.g. at least 95%, 98% or 99%, of the proteins, polynucleotides, 
cells or dry mass of the preparation. 

The protomers may be prepared using the vectors and host cells using standard techniques.
For example, the comments made in WO-00/68248 regarding methods of preparing protomers
(referred to as "fusion proteins" in WO-00/68248) apply equally to preparation of protomers
according to the present invention.

Assembly of the protein layer from the protomers may be performed simply by placing the
protomers under suitable conditions for self-assembly of the monomers of the oligomer assemblies.
Typically, this will be performed by placing the protomers in solution, preferably an aqueous
solution. Typically, the suitable conditions will correspond to those in which the naturally occurring
protein self-assembles in nature. Suitable conditions may be those specifically disclosed in
WO-00/68248.

In the case of homologous protomers this results in direct assembly of the protein layer.

In the case of heterologous protomers, assembly is preferably performed in plural stages. In a
first stage, each type of protomer is separately assembled into a respective discrete component. In a
second stage, the discrete components are brought together and assembled into the protein layer.
Where plural heterologous protomers are used, there may be further stages intermediate the first and
second stage in which the respective discrete components are brought together and assembled into
larger, intermediate components.

There will now be described a method by which there has been prepared a specific protein
layer which is an example of the type shown in Fig. 2.

The protomers consisted of a first monomer being E. coli ALAD and a second monomer
being streptag I. The third monomer was streptavidin.

The protomers were prepared in an E. coli plasmid vector using standard techniques. The
E. coli plasmid vector was a derivative of pUC19 having the sequence SEQ ID NO. 1. The sequence
of the protomer is:

MTMGSMTDLIQRPRRLRKSPALRAMra 

HLAREffiRL\NAGIRSVMTFGISHHTO 
EYTSHGHCGVLCEHGVDNDATLENLGKQAWAAAAGAXFIAPSAAMDGQVQAIRQALDAA

GFBd^TAMSYSTKFASSFYGPFREAAGSALKGDRKSYQMNPMNRREAIRESLLDEAQGANCL 

MVKPAGAYLDIV^LP^RTELPIGAYQVSGEYAMIKFAALAGAIDEEKVVLESLGSIKRAGA 

DLEFSYFALDLAEKKILRRSAWRHPQFGG (SEQ ID NO. 2) 
The sequence of Streptag I is: 
AWRHPQFGG (SEQ ID NO. 3)

The sequence of streptavidin (as used in the work described herein) is: 

MET GLU ALA GLY ILE THR GLY THR TRP TYR ASN GLN LEU 

GLY SER THR PHE ILE VAL THR ALA GLY ALA ASP GLY ALA 

LEU THR GLY THR TYR GLU SER ALA VAL GLY ASN ALA GLU 
SER ARG TYR VAL LEU THR GLY ARG TYR ASP SER ALA PRO

ALA THR ASP GLY SER GLY THR ALA LEU GLY TRP THR VAL 

ALA TRP LYS ASN ASN TYR ARG ASN ALA HIS SER ALA THR 

THR TRP SER GLY GLN TYR VAL GLY GLY ALA GLU ALA ARG 

ILE ASN THR GLN TRP LEU LEU THR SER GLY THR THR GLU 
ALA ASN ALA TRP LYS SER THR LEU VAL GLY HIS ASP THR

PHE THR LYS VAL LYS PRO SER ALA ALA SER (SEQ ID NO:4). 
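
Because SEQ ID NO:4 is listed in three-letter code while the other sequences use one-letter code, it can be convenient to convert between the two for comparison. The following Python sketch performs the conversion using the standard amino acid symbols; the function name is hypothetical and only the first line of the listing is used as an example.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(three_letter_sequence: str) -> str:
    """Convert whitespace-separated three-letter residue codes to one-letter codes."""
    return "".join(THREE_TO_ONE[code.upper()] for code in three_letter_sequence.split())


# First line of SEQ ID NO:4 as listed above.
first_line = "MET GLU ALA GLY ILE THR GLY THR TRP TYR ASN GLN LEU"
print(to_one_letter(first_line))  # MEAGITGTWYNQL
```

An unrecognised code raises a `KeyError`, which is a deliberate choice so that transcription errors in a listing are not silently skipped.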
For reference the sequence of avidin is:
ARKCSLTGKW TNDLGSNMTI GAVNSRGEFT GTYTTAVTAT SNEDCESPLH GTQNTENKRT 
QPTFGFTVNW KFSESTTVFT GQCFIDRNGK EVLKTMWLLR SSVNDIGDDW KATRVGEMIF 
TRLRTQKE (SEQ ID NO:5). 
The gene encoding 5-aminolaevulinic acid dehydratase (ALAD) was amplified from
DH5alpha genomic DNA and inserted into the DsRed-Express-streptagI expression vector described
above to replace the DsRed-Express gene cassette.

An ALAD-streptagI protomer was then prepared. 0.1 mM IPTG was included in the
expression medium. Induction of expression was as follows: a 10 ml overnight culture of the
expression strain (in LB broth containing 30 µg/ml kanamycin) was diluted 1:100 into fresh LB broth
containing 30 µg/ml kanamycin. Cells were grown with shaking at 37°C to a density corresponding to
an OD600 of 0.6 and were then induced to express the target protein by the addition of IPTG to a final
concentration of 1 mM. The culture was maintained at 37°C with shaking for a further 3 hours before
the cells were harvested by centrifugation (5000g, 10 min, 4°C). The cell pellet was resuspended in
20 ml of buffer A (300 mM NaCl, 1 mM EDTA, 50 mM HEPES, pH 7.5). Cells were lysed by
sonication and the insoluble fraction harvested by centrifugation (25,000g, 30 min, 4°C). This
fraction was dissolved in 8 M urea and centrifuged (25,000g, 30 min, 4°C) to remove insoluble
particles. The urea-solubilised material was concentrated to 16 mg/ml and passed through a 0.22 µm
filter. A drop of this material (1 µl) was then directly injected into a larger drop (5 µl) of buffer A.
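
The concentrations reached in the steps above follow from the usual dilution relation C1·V1 = C2·V2. The following Python sketch works through two of them. It is illustrative only: it assumes complete mixing of the two drops, that the whole 10 ml overnight culture is diluted 1:100 (giving a 1000 ml culture), and a hypothetical 1 M IPTG stock, none of which is stated in the protocol.

```python
def diluted_concentration(stock_conc: float, stock_volume: float, diluent_volume: float) -> float:
    """Concentration after mixing stock into diluent (C1*V1 = C2*(V1 + V2))."""
    return stock_conc * stock_volume / (stock_volume + diluent_volume)


def stock_volume_for(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """Volume of stock needed to reach final_conc in final_volume."""
    return final_conc * final_volume / stock_conc


# 1 µl of urea-solubilised protein (8 M urea, 16 mg/ml) injected into 5 µl of buffer A.
urea_after_mixing = diluted_concentration(8.0, 1.0, 5.0)      # M
protein_after_mixing = diluted_concentration(16.0, 1.0, 5.0)  # mg/ml
print(f"urea ~{urea_after_mixing:.2f} M, protein ~{protein_after_mixing:.2f} mg/ml")

# IPTG needed for 1 mM final in a 1000 ml culture, assuming a 1 M stock (hypothetical).
iptg_ml = stock_volume_for(final_conc=1e-3, final_volume=1000.0, stock_conc=1.0)
print(f"IPTG stock: {iptg_ml:.1f} ml")
```

Under these assumptions the injection step lowers the urea concentration roughly six-fold, which is presumably what permits refolding and assembly in the larger drop.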

In general many expression and purification options are available. Another repeatedly
successful protocol is as follows:

1. A single colony of BL21(DE3)Star E. coli was transferred from a Luria-Bertani agar plate
to 500 ml of Luria-Bertani medium containing 75 µg/ml ampicillin and 0.1 mM isopropylthio-beta-D-
galactopyranoside (IPTG).

2. This culture was incubated with shaking for 18 hrs at 37°C.

3. The culture was harvested by centrifugation (5,000g, 5 min) and resuspended in 10 ml of
buffer "GF" (150 mM NaCl, 50 mM Tris-HCl, 1 mM EDTA, 0.02% sodium azide, pH 8.0).

4. Cells were lysed using either sonication, freeze-thaw, cell lysis reagents (e.g. "Bugbuster"),
or lysozyme and DNAse treatment. These are techniques standard in the art.

5. The insoluble fraction was removed by centrifugation (30,000g, 30 min).

6. The fusion protein was purified from the soluble fraction using Strep-tactin sepharose (IBA
GmbH) according to the manufacturer's instructions.

7. Eluted protein was separated from the desthiobiotin contamination that results from the
Strep-tactin column by means of size exclusion chromatography using a superose 6 matrix and buffer
GF.

8. Purified protein could be stored at 4°C for at least 6 months.

The purified ALAD-streptag I protomer (~1 mg/ml) was mixed with commercially available
core streptavidin in equimolar amounts. Self-assembly commenced immediately and the resultant
protein lattices were visualised by means of transmission electron microscopy. Fig. 3 shows a
negatively stained transmission electron micrograph of the protein lattice, the unit cell size being
13 nm x 13 nm. Image processing of the electron micrographs was performed to enhance the image
quality. In particular the electron micrograph was Fourier transformed, filtered using a space group
derived filter and averaging, and then reconstructed.
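
The general idea of Fourier filtering a lattice image can be sketched in a few lines of Python. The example below is a simplified stand-in and not the processing actually used for Fig. 3: it keeps only the Fourier components lying on the reciprocal lattice of a square lattice of known period, which is equivalent to averaging over all unit cells, and it does not apply the additional point symmetry that a space group derived filter would impose. The image is synthetic, and all names and parameters are hypothetical.

```python
import cmath
import math
import random


def dft2(image, inverse=False):
    """Naive 2D discrete Fourier transform of a square list-of-lists (small sizes only)."""
    n = len(image)
    sign = 1 if inverse else -1
    # Precompute the twiddle factors exp(sign * 2*pi*i * k / n).
    w = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    # Transform along rows, then along columns (the 2D DFT is separable).
    rows = [[sum(row[x] * w[(u * x) % n] for x in range(n)) for u in range(n)] for row in image]
    out = [[sum(rows[y][u] * w[(v * y) % n] for y in range(n)) for u in range(n)] for v in range(n)]
    if inverse:
        out = [[value / (n * n) for value in row] for row in out]
    return out


def lattice_filter(image, period):
    """Keep only Fourier components on the reciprocal lattice of a square lattice
    with the given real-space period (in pixels), then reconstruct."""
    n = len(image)
    if n % period != 0:
        raise ValueError("image size must be a multiple of the lattice period")
    step = n // period  # spacing of reciprocal-lattice points in the transform
    spectrum = dft2(image)
    masked = [[spectrum[v][u] if (u % step == 0 and v % step == 0) else 0
               for u in range(n)] for v in range(n)]
    return [[value.real for value in row] for row in dft2(masked, inverse=True)]


def rms_error(a, b):
    """Root-mean-square difference between two square images."""
    return math.sqrt(sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / (len(a) ** 2))


# Synthetic test: a periodic pattern with added Gaussian noise.
random.seed(0)
n, period = 12, 4
ideal = [[math.cos(2 * math.pi * x / period) + math.cos(2 * math.pi * y / period)
          for x in range(n)] for y in range(n)]
noisy = [[value + random.gauss(0, 0.5) for value in row] for row in ideal]
filtered = lattice_filter(noisy, period)
print(rms_error(noisy, ideal), rms_error(filtered, ideal))
```

The filtered image is exactly periodic and lies closer to the noise-free pattern than the noisy input. For real micrographs a fast Fourier transform and a mask of finite width around each lattice peak would normally be used instead of this naive transform.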

Protein layers in accordance with the present invention have numerous different uses. In
general, such uses will take advantage of the regular repeating structure and/or the pores which are
present within the structure. Layers in accordance with the present invention may be designed to have
pores with dimensions expected to be of the order of nanometres to hundreds of nanometres. Layers
may be designed with an appropriate pore size for a desired use.

The highly defined, unusually sized and finely controlled pore sizes of the protein lattices or
layers together with the stability of their structures make them ideal for applications requiring
microporous materials with pore sizes in the range just mentioned. As one example, the lattices or
layers are expected to be useful as a filter element or molecular sieve for filtration or separation
processes. In this use, the pore sizes achievable and the ability to design the size of a pore are
particularly advantageous.

In another class of use, molecular entities would be attached to the protein layer. Such
attachment may be done using conventional techniques. The molecular entities may be any entities of
an appropriate size, typically a macromolecular entity, for example proteins, polynucleotides, such as
DNA, or non-biological entities. The molecular entities may be a single molecule or a complex of
plural molecules. As such, the protein layers are expected to be useful as biological matrices for
carrying molecular entities, for example for use in drug delivery, or for crystallizing molecular
entities.

Attachment of the molecular entities to the protein layer may be performed in a number of
ways.

Some approaches involve "tagging" either or both of the protein protomers (or other
component of the layer) or the molecular entities of interest. In this context, tagging is the covalent
addition to either or both of the protein protomers (or other component of the layer) or to the target
molecular entities, of a structure known as a tag or affinity tag which forms strong interactions with a
target structure. Typically, short peptide motifs (e.g. heterodimeric coiled coils such as the "Velcro"
acid and base peptides) are used for this purpose. In the case of the protein protomer (or other
component of the layer), or a molecular entity which is a protein, this may be achieved by
genetically fusing the tag to a component of the protein layer or the molecular entity, that is the
expression of a genetically modified version of the protein to carry an additional sequence of peptide
elements which constitute the tag, for example at one of its termini, or in a loop region. Alternative
methods of adding a tag include covalent modification of a protein after it has been expressed,
through techniques such as intein technology.

In one approach, the target structure may be a further tag attached to the other of the protein
protomer or target molecular entity, i.e. both of a component of the layer and target molecular entity
include complementary affinity tags for attachment to each other.

In another approach, the target structure may be a part of the protein protomer (or other
component of the layer) or target molecular entity, i.e. one of a component of the layer and target
molecular entity has an affinity tag which has an affinity to the other of a component of the layer and
target molecular entity. Thus, to attach the molecular entity to the protein layer, a component of the
layer may include, at a predetermined position in the protomers, an affinity tag attached to the
molecular entity of interest. Alternatively, the molecular entity of interest may have, at a
predetermined position in the molecular entity, an affinity tag attached to a component of the layer.

When a component of the protein layer is known to form strong interactions with a known
peptide sequence, that peptide sequence may be used as a tag to be added to the target molecular
entity. Where no such tight binding partner is known, suitable tags may be identified by means of
screening. The types of screening possible are phage-display techniques, or redundant chemical
library approaches to produce a large number of different short (for example 3-50 amino acid)
peptides. The tightest binding peptide elements may be identified using standard techniques, for
example amplification and sequencing in the case of phage-displayed libraries or by means of peptide
sequencing in the case of redundant libraries.

An alternative approach is for the target molecular entity itself to be expressed as a direct 
genetic fusion to a component of the layer. 

Another alternative approach is to make specific chemical modifications of the lattice in
order to provide alternative affinity-based or covalent means of attachment. For example, the
site-specific derivatization of accessible sulphydryl groups in the lattice may be used for the incorporation
of nitrilo-triacetic acid (NTA) groups which in turn may be used for binding of metal ions and hence
histidine-rich target proteins.

To attach the molecular entity to the protein layer using an affinity tag on the layer or the
molecular entity, the molecular entity may be allowed to diffuse into, and hence become attached to,
a pre-formed protein layer. Annealing of the bound molecular entities into their lowest
energy configurations in the protein layer may be performed, for example, using controlled cooling
in a liquid nitrogen cryostream. Alternatively, the molecular entities may be mixed with the protomers during
formation of the protein layer to assemble with the layer.

In another class of uses, proteins having useful properties could be incorporated as one of the
protomers.

A use in which an entity is attached to the protein layer is to perform X-ray crystallography
of the molecular entities. In this case, the regular structure of the protein layer allows the molecular
entities to be held at a predetermined position relative to a repeating structure, so that they are held in
a regular array and in a regular orientation. X-ray crystallography is important in biochemical
research and rational drug design.

The protein layer having an array of molecular entities supported thereon may be studied
using standard X-ray crystallographic techniques. Use of the protein layer as a support in X-ray
crystallography is expected to provide numerous and significant advantages over current technology
and protocols for X-ray crystallography, including the following:

(1) Significantly lower amounts of molecule will be required (probably of order micrograms
rather than milligrams). This will allow determination of some previously intractable targets.

(2) Use of affinity tags will allow structure determination without the typical requirement for a
number of purification steps.

(3) There will be no need to crystallize the molecular entity. This is a difficult and occasionally
insurmountable step in traditional X-ray structure determination.

(4) There will be no need to obtain crystalline derivatives for each novel crystal structure to
obtain the required phase information. Since the majority of scattering matter will be the known
protein layer in each case, determination of the structure may be automated and achieved rapidly by a
computer user with little or no crystallographic expertise.

(5) The complexes of a protein with chemicals (substrates/drugs) and with other proteins can be
examined without requiring entirely new crystallization conditions.

(6) The process is expected to be extremely rapid and universally applicable, which will provide
enormous savings in time and costs.

For use in catalysing biotransformations, enzymes may be attached to the protein layer, or
incorporated in the protein layer.

For use in data storage, it may be possible to attach a protein which is optically or
electronically active. One example is bacteriorhodopsin, but many other proteins can be used in this
capacity. In this case, the protein layer holds the attached protein in a highly ordered array, thereby
allowing the array to be addressed. The protein layer might overcome the size limitations of existing
matrices for holding proteins for use in data storage.

For use in a display, it may be possible to attach a protein which is photoactive or 
fluorescent. In this case, the protein layer holds the attached protein in a highly ordered array, thereby 
allowing the array to be addressed for displaying an image. 

For use in charge separation, a protein which is capable of carrying out a charge separation
process may be attached to the protein layer, or incorporated in the protein layer. Then the protein
may be induced to carry out the separation, for example biochemically by a "fuel" such as ATP or
optically in the case of a photoactive centre such as chlorophyll or a photoactive protein such as
rhodopsin. A variety of charge separation processes might be performed in this way, for example ion
pumping or development of a photo-voltaic charge.

For use as a nanowire, a protein which is capable of electrical conduction may be attached to
the protein layer, or incorporated in the protein layer. Using an anisotropic protein layer, it might be
possible to provide the capability of carrying current in a particular direction.

For use as a motor, proteins which are capable of induced expansion/contraction may be 
incorporated into the protein layer. 

The protein lattices may be used as a mould. For example, silicon could be diffused or
otherwise impregnated into the pores of the protein lattice, thus either partially or completely filling
the lattice interstices. The protein material comprising the original lattice may, if required, then be
removed, for example, through the use of a hydrolysing solution.

Another use in which an entity is attached to the protein layer is to perform electron
microscopy of the molecular entities. This may be performed to determine the structure of the
entities. The entities may be of any type including a macromolecule (e.g. a protein or DNA) or a
macromolecular complex (e.g. a complex of a macromolecule with one or more other molecular
species).

There will first be described known electron microscopy techniques by way of background.

Fig. 4 schematically shows a transmission electron microscope 10 arranged as follows. An
electron source 11 produces electrons. An objective lens system 12 directs a beam of electrons from
the source 11 onto a sample 13. An imaging lens system 14 directs electrons transmitted through the
sample 13 onto a sensor 15 which produces an image. The image may be a focussed image or may be
a diffraction pattern, the latter being useful where the entity is presented in a regular array (e.g. tubes
of molecules, 2D crystals, or helical arrays). Information from multiple images, corresponding to
multiple different views of the molecular species, may be subsequently combined to produce a 3D
reconstruction.

Sample preparation and presentation within the microscope is performed as follows. 
In practice, samples 13 are presented to the electron beam within the sample holder of an 
electron microscope. Samples 13 are generally mounted on a copper grid. This may have been 
coated with a thin layer of deposited carbon that may in turn be either continuous across the holes of 
the grid, or may be deliberately incomplete so as to leave holes in which the sample floats (a "lacey" 
carbon layer). 

Details of the sample mounting protocol depend on whether or not the sample is to be 
visualised under cryo-conditions. 
For cryo conditions, the sample 13 may be introduced into a medium that is augmented with 
a cryoprotectant agent so as to minimise the tendency to form ice at low temperatures. Examples of 
cryoprotectant agents include glucose and trehalose. In addition, a contrast-enhancing constituent 
may be added to the sample 13 environment. An example of a contrast enhancing agent is tannin. 
After cryoprotection, the sample 13 is introduced onto the (possibly coated) copper grid, excess 
sample and embedding medium are withdrawn by blotting so as to produce a sample 13 no thicker 
than 1000 Å, and the grid is introduced into an environment at cryo-temperatures (< 200 K). The 
speed of cooling is an important factor in avoiding the formation of ice and consequent sample 
damage during freezing. Rapid cooling may be achieved by plunging the sample 13 into liquid 
nitrogen, into a stream of gaseous nitrogen at temperatures below 120 K, or into a bath of a less 
volatile liquid (such as propane) at cryo temperatures. Mechanical stages may be used to ensure a 
rapid and reproducible introduction of the copper grid into the freezing environment. 

Where samples 13 are not to be presented in vitreous frozen solution (i.e. under non-cryo 
conditions), a solution of the substance to be imaged is introduced onto a carbon-coated copper grid, 
a period of time is left for the sample to adsorb to the carbon layer, and then excess sample and solution 
are withdrawn by blotting. To enhance the contrast of images, and to minimise the deleterious 
consequences of radiation damage, the sample 13 may then be stained. Since biological samples 
demonstrate intrinsically low scattering, the stains used are generally themselves electron dense, and 
hence strongly scattering. Thus the stains used are generally "negative stains": the images recorded 
are dark where the stain is, and are lighter in regions from which stain is excluded by the presence of 
the sample. Uranyl acetate is an example of a negative stain. 

Data collection is performed as follows. 

In the case of deriving a focussed image, images are in fact recorded away from perfect 
focus. While this is done to generate contrast in the image, it results in a degradation of the image. 
Specifically, Fourier terms calculated from the image are modulated by a "Contrast Transfer 
Function" (CTF), which modulates the amplitudes of Fourier terms in a manner that is a function of 
the corresponding scattering angle. Corrected Fourier terms can generally be recovered by 
appropriate scaling once the extent of defocus and astigmatism has been characterised. At a given 
defocus, the CTF will adopt a value of zero for Fourier terms corresponding to particular scattering 
angles. These terms cannot, therefore, be recovered by post processing. To fill in the corresponding 
holes in reciprocal space, images are recorded at a range of defocuses, so that Fourier terms that are 
modulated to (or close to) zero in an image recorded at one defocus will have a measurable amplitude 
at another defocus. 
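
By way of illustration only, the dependence of the CTF zeros on defocus described above may be sketched with a simplified phase-contrast model, sin(chi), where chi = pi * lambda * dz * s^2 - (pi/2) * Cs * lambda^3 * s^4. The function names, the sign convention, the neglect of amplitude contrast and envelope damping, and the wavelength of about 0.0251 Å (an assumed 200 kV instrument) are assumptions of this sketch and are not taken from the method described herein.

```python
import math

WAVELENGTH_200KV = 0.0251  # electron wavelength in angstrom; assumes a 200 kV instrument


def ctf(s, defocus, cs=0.0, wavelength=WAVELENGTH_200KV):
    """Simplified phase-contrast CTF at spatial frequency s (1/angstrom).

    defocus and cs (spherical aberration) are in angstrom. Amplitude
    contrast and envelope damping are ignored in this sketch.
    """
    chi = (math.pi * wavelength * defocus * s ** 2
           - 0.5 * math.pi * cs * wavelength ** 3 * s ** 4)
    return math.sin(chi)


def first_zero(defocus, wavelength=WAVELENGTH_200KV):
    """Spatial frequency of the first CTF zero when cs is neglected (chi = pi)."""
    return math.sqrt(1.0 / (wavelength * defocus))


if __name__ == "__main__":
    s0 = first_zero(10000.0)  # first zero for an assumed 1 micron defocus
    for dz in (10000.0, 15000.0):
        print(f"defocus {dz:7.0f} A: CTF at s0 = {ctf(s0, dz):+.3f}")
```

In this sketch, the Fourier term that is lost at the first defocus is transferred at close to full amplitude at the second, which is the reason for recording at a range of defocuses.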

Inelastic interaction of electrons with the sample results in deposition of energy that, in turn, 
causes damage to the sample. This damage degrades the structure of the molecules within the 
sample. For this reason, images and diffraction patterns are recorded using a relatively low dose of 
electrons. This experimental limitation means that there is a relatively poor signal to noise ratio in 
the recorded images of each molecular species captured within the field of view of an image. This 
translates to each image carrying relatively low resolution information about the structure of the sample. 
In general, enhancement of the signal to noise ratio is achieved by effectively averaging the images of 
multiple molecules that are observed in the same (or similar) orientations with respect to the electron 
beam. 
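
By way of illustration only, the benefit of averaging may be sketched with synthetic one-dimensional "images". The sketch assumes that the copies are already aligned in the same orientation and that the noise is independent and Gaussian; all names and values are hypothetical.

```python
import math
import random


def noisy_copy(signal, sigma, rng):
    """One low-dose 'image': the true signal plus independent Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in signal]


def average(images):
    """Pixel-wise mean of equally sized, pre-aligned images."""
    n = len(images)
    return [sum(column) / n for column in zip(*images)]


def rms_error(image, reference):
    """Root-mean-square deviation of an image from the noise-free reference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(image, reference)) / len(reference))


def demo(n_images=100, sigma=1.0, seed=0):
    """Return (error of a single image, error of the average of n_images)."""
    rng = random.Random(seed)
    signal = [math.sin(2 * math.pi * i / 16) for i in range(64)]
    images = [noisy_copy(signal, sigma, rng) for _ in range(n_images)]
    return rms_error(images[0], signal), rms_error(average(images), signal)
```

Under these assumptions the residual noise falls roughly as the square root of the number of images averaged, so that 100 copies give about a ten-fold improvement.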

Each image approximates to a projection of the electron density (or more precisely the 
potential) distribution of the molecular species. Hence, a single image of a single molecule does not 
contain sufficient information to infer the 3D structure of that molecule. Therefore, images have to 
be recorded from the sample in multiple orientations with respect to the beam. 

For periodic structures, Fourier components can be measured directly by recording the 
diffraction pattern, rather than an image of the sample. This approach avoids the complication of 
modulation by the CTF although other characteristics of the experiment and of the instrument must 
still be corrected for in post-processing. For periodic samples (e.g. 2D crystals or helical arrays), 
scattering becomes concentrated into discrete directions that are characteristic of the size and shape 
of the repeated unit (i.e. the unit cell), giving rise to diffraction spots in the scattering pattern, rather 
than a continuous scattering function. This process of "Bragg amplification" makes for readily 
recordable signals. A further advantage of recording the diffraction pattern is that the intensities of 
the scattered pattern (i.e. that property which is recorded) are independent of global motions of the 
sample during the exposure. Such motions can be caused by thermal fluctuation as well as specific 
heating and charging of the sample caused by the electron beam. A disadvantage of recording the 
scattered pattern rather than focussed electrons (i.e. a diffraction pattern rather than an image) is that 
recording of the scattered pattern loses phase information for the Fourier terms. At the same time, 
local imperfections in a periodic sample can be corrected if an image thereof is collected, but not (trivially) if a 
diffraction pattern is collected. 
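
By way of illustration only, the two properties of diffraction intensities noted above (independence from global translation of the sample, and loss of phase) may be sketched with a one-dimensional discrete Fourier transform. The naive transform, the circular shift used to model a global motion, and all names are assumptions of this sketch.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def intensities(x):
    """Squared moduli of the Fourier terms: the quantity recorded in diffraction."""
    return [abs(c) ** 2 for c in dft(x)]


def shift(x, m):
    """Circularly translate the sample by m pixels (a model of global motion)."""
    m %= len(x)
    return x[-m:] + x[:-m]


if __name__ == "__main__":
    sample = [0, 1, 3, 1, 0, 0, 2, 0]
    moved = shift(sample, 3)
    print("intensity, term 1:", intensities(sample)[1], intensities(moved)[1])
    print("phase, term 1:    ", cmath.phase(dft(sample)[1]), cmath.phase(dft(moved)[1]))
```

The intensities are identical before and after the shift, whereas the phases differ; the phases therefore cannot be recovered from the intensities alone.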

In the case of electron tomography, a single example of the species to be visualised is imaged 
with extremely low dose at a range of orientations. Hence a single molecular species is imaged. This 
addresses a potential criticism of other approaches: each representative molecule of a sample might 
be subtly different, which makes both the averaging of multiple images and 3D reconstruction 
inappropriate. It has the disadvantage that the electron dose that can be tolerated by a single species 
is spread over imaging in multiple orientations: this ultimately limits the resolution of 3D 
reconstruction that can be achieved. 

Data analysis is performed as follows. 

The protocol used to analyse data from electron microscopy depends primarily on whether 
the sample is periodic (i.e. 2D crystalline or presented in a helical array), or aperiodic, i.e. presented 
as isolated particles which may or may not have local rotational symmetry, but which lack significant 
translational symmetry. In both cases, where image data (rather than diffraction data) have been collected, 
the defocus and astigmatism of the sample are identified by analysis of the intensity distribution of 
Fourier transformed regions of the image. Based on these values, which may vary across the image, 
an appropriate correction can be calculated to compensate for CTF effects. 

One type of data analysis is single particle reconstruction. This allows reconstruction of a 
three-dimensional (3D) image from images of individual entities, as follows. 
For non-periodic samples, images (rather than diffraction patterns) are recorded. Analysis 
begins with locating samples on the recorded image. For unstained biological molecules this presents 
a significant problem: the inherently low signal-to-noise ratio means that molecules may not be 
apparent against background. Even if they are visible, such molecules may be so poorly imaged as to 
preclude the characterisation of their orientation compared to other images of the same molecular 
species. This problem is made worse where the species to be visualised is small. In practice, it is not 
readily possible to apply conventional EM to non-crystalline samples of macromolecules (or 
macromolecular complexes) with a combined molecular weight less than ~125 kDa. 

After locating multiple molecular species to assemble a "dataset" of (noisy) images, the next 
stage is classification. In this step, images of the molecular species are grouped, so that those that 
represent similar views are associated with each other. Particularly where a carbon support has been 
used, there may be a limited set of such views present in the dataset. Images of particles that fall 
within such clusters are averaged to provide "class averages". The relative orientations of a set of 
class averages are determined by means of a "common lines" or similar approach. Ultimately, this 
allows the information from multiple different views to be assembled in reciprocal space so as to 
permit 3D reconstruction. 

Another type of data analysis is two-dimensional (2D) crystallographic analysis. This is 
applicable to periodic samples. Data may have been collected as images or as diffraction patterns. 

In images of a crystalline lattice, recognition of the geometry and location of the lattice 
provides a readily exploited means of predicting the location of the multiple copies of the species to 
be imaged. Averaging can be performed either in real space (where individual unit cell images are 
summed) or in reciprocal space. In the latter approach, the image is Fourier transformed to produce a 
set of diffraction spots that result from scattering by that part of the image which has a periodic 
character, i.e. by the ordered array of molecules. The rest of the scattering (i.e. that intensity which 
does not fall at the position of diffraction spots) comes from background and noise. The multiple unit 
cells in the field of view can therefore be averaged by setting all off-peak intensities to zero and 
carrying out a further Fourier transformation. This process is called Fourier filtering. Both real 
space averaged and Fourier filtered images can be enhanced by a process of "unbending". In this 
process, local distortions of the lattice can be identified (generally by an autocorrelation method), 
and used to correct the image to generate a picture that would prevail if the lattice were not subject to 
any local distortion. 
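
By way of illustration only, Fourier filtering as described above may be sketched in one dimension, a row of pixels standing in for a 2D image. The sketch assumes an undistorted lattice containing a whole number of unit cells, so that the diffraction spots fall exactly on known Fourier terms; no unbending is attempted, and all names are hypothetical.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive forward or inverse discrete Fourier transform."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def fourier_filter(image, period):
    """Zero every off-peak Fourier term and transform back to real space."""
    n = len(image)
    if n % period:
        raise ValueError("image length must be a whole number of unit cells")
    cells = n // period  # diffraction spots fall at multiples of this index
    spectrum = dft(image)
    masked = [c if k % cells == 0 else 0 for k, c in enumerate(spectrum)]
    return [v.real for v in dft(masked, inverse=True)]


def real_space_average(image, period):
    """Sum the individual unit cell images and divide by their number."""
    cells = len(image) // period
    return [sum(image[j + period * c] for c in range(cells)) / cells
            for j in range(period)]


def demo(seed=1):
    """Noisy lattice of 8 unit cells; returns (filtered image, averaged unit cell)."""
    rng = random.Random(seed)
    motif = [0, 1, 4, 1, 0, 0, 2, 0]
    image = [motif[j % 8] + rng.gauss(0.0, 1.0) for j in range(64)]
    return fourier_filter(image, 8), real_space_average(image, 8)
```

For an ideal lattice of this kind the two routes give the same result: the filtered image is the real-space averaged unit cell repeated at every lattice position.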

Diffraction patterns of the crystalline lattice can be used to measure directly the amplitudes 
of Fourier components. Phases for these terms can be established only using methods analogous to 
those used in protein crystallography. These include isomorphous replacement (IR), molecular 
replacement (MR), and density modification (DM). For IR, diffraction has to be measured before 
and after the addition (or substitution) of a part of the structure. For MR, a known structure or 
electron density distribution can be used to calculate phases for the unknown structure. For DM, 
phases for low resolution terms must be available (e.g. from analysis of images as above), and 
phasing of high resolution terms is achieved by iterative imposition of averaging and solvent 
flattening, including increasingly high resolution terms into the process as phase is extended. 

The disposition of the crystal with respect to the beam can be inferred from the apparent 
geometry of diffraction spots, which may either be recorded directly or calculated by Fourier 
transformation of an image, provided that the geometry of the repeating unit in the crystal (the unit 
cell geometry) is known. Where the structure of a significant part of the lattice is known, a calculated 
image of this part of the lattice can also be used to assess the orientation of the lattice in an 
experimentally recorded image. Thus information from multiple images in multiple orientations, 
collected at multiple tilt angles, can readily be combined to carry out 3D image reconstruction. 

The application to imaging of molecular entities supported on a protein lattice will now be 
described. Benefits are achieved because the entities are each supported at a predetermined position 
in the repeating structure of the protein layer. 

There may be used a conventional transmission electron microscope 10, for example as 
shown in Fig. 4. Imaging is performed using the method shown in Fig. 5. 

First, in step S1, there is prepared a protein lattice having the molecular entities attached 
thereto. This is done using the techniques described above. A sample 13 for the transmission electron 
microscope 10 is prepared with the protein lattice using standard procedures, as discussed above. 

Two approaches for attaching the entity to the protein layer are as follows. 
In the first approach, the entity is added to a solution (or suspension) containing the protein 
layer. Thus the entities attach to the layer in solution. The resultant layer is then subjected to sample 
preparation as described above for either cryo electron microscopy or for non-cryo electron 
microscopy, either with or without staining. 

In the second approach, the protein layer is first deposited onto the carbon layer of a coated 
copper grid to form the sample holder of the electron microscope 10. The entity is introduced 
subsequently. In this case, a suspension of the protein lattice is placed on the carbon-coated grid, 
adsorption is allowed to occur, excess crysalin and surrounding solution are removed, and a solution 
of the target species is introduced. After an incubation in which binding of the target to the crysalin 
occurs, excess target and surrounding solution are removed. Subsequent sample preparation is as 
described above for either cryo electron microscopy or for non-cryo electron microscopy. 

For optimal resolution in the structure of the molecular entity, it is preferable for the 
molecular entities to be aligned with identical orientations with respect to every axis. In step S2, 
which is optional, the molecular entities are aligned with respect to the protein lattice. 

Two possible methods of molecular alignment which may be implemented, either 
independently or in combination, are as follows. 
A first alignment method is to apply an electric field with a vector parallel to the principal 
symmetry axis of the "first" protein layer component in order to align the molecular entities by virtue 
of their intrinsic or induced dipoles. 

A second alignment method takes advantage of polar and/or hydrophobic interactions 
between molecular entities and the protein layer through a process of thermal annealing during which 
the target molecules are slowly cooled to identical minimum energy conformations. 

In step S3, imaging is performed to derive an image. Such data collection is conducted using 
standard protocols, for example as described above for conventional EM. By way of example, images 
may be collected at a series of defocus steps and also employing the tilt-stage of the microscope to 
image the lattice through a range of angles. Where orientation of the target molecules has been 
successful, a series of electron diffraction images may also be usefully collected. 

In step S4, data analysis of the images is performed. A variety of data analysis techniques 
may be applied, as follows. 

Where it has been possible to impose an approximately common orientation of each bound 
target molecule with respect to the underlying lattice, a 2D crystallographic data analysis may be 
performed, as described above. This allows a 3D reconstruction of the target molecule to be derived. 
Single particle image reconstruction tools can also theoretically be applied to image 
reconstruction of 2D periodic arrays, and where this provides improved image reconstruction, that 
approach is also taken to image protein layers and attached molecular entities. Hybrid methods, 
whereby some computational techniques of 2D crystallography are combined with computational 
techniques of single particle image analysis, are also used where this is suitable. 

Where it has not been possible to impose an approximately common orientation of each 
bound target molecule, a combination of the methods outlined above for single particle 3D 
reconstruction and 2D crystallography is applied. In this combination, the components of the 
protein lattice itself are identified and subtracted from the image. 
The components of the protein lattice may be derived as described above from an analysis of 
one or more recorded images of a protein layer and attached molecular entities. Alternatively, the 
components of the protein lattice may be derived from a reference image acquired separately or being 
a stored image acquired previously. 

This allows the lattice components of each image to be identified and removed. The resulting 
difference image is an image of the entities in isolation that would have been recorded if the entities 
were disposed in space at positions having the same repeating pattern as the structure of the protein 
layer, albeit in a partially random orientation. 

Thereafter single particle reconstruction is performed, as described above. This process is 
expedited by the fact that the molecular entities will be found at readily predicted positions on the image, 
as a consequence of their binding to known locations on the protein layer, the location and orientation 
of which is readily identified. The subtraction of the reference image effectively accomplishes the 
first step of single particle 3D reconstruction (particle picking) as described above. Similarly, a 
degree of alignment of the molecules is likely to apply and contributes to particle classification. 
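
By way of illustration only, the subtraction of the lattice components to obtain the difference image described above may be sketched as follows. The sketch assumes that the recorded image and the lattice reference are already registered on the same pixel grid and, optionally, related by a known intensity scale factor; the function name and its parameters are hypothetical.

```python
def subtract_lattice(image, lattice_reference, scale=1.0):
    """Return the difference image: recorded image minus the scaled lattice.

    Both arguments are 2D lists of pixel values of identical shape. The
    registration of the reference onto the image is assumed to have been
    carried out beforehand and is not part of this sketch.
    """
    same_shape = len(image) == len(lattice_reference) and all(
        len(row) == len(ref_row) for row, ref_row in zip(image, lattice_reference))
    if not same_shape:
        raise ValueError("image and reference must have the same shape")
    return [[p - scale * q for p, q in zip(row, ref_row)]
            for row, ref_row in zip(image, lattice_reference)]
```

The pixels that remain in the difference image lie at positions fixed by the lattice, which is why particle picking is largely accomplished by the subtraction itself.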
Variants 

Homologues of protein sequences are referred to herein. Such homologues typically have at 
least 70% homology, preferably at least 80%, 90%, 95%, 97% or 99% homology, for example over a 
region of at least 15, 20, 30, 100 or more contiguous amino acids. The homology may be calculated on 
the basis of amino acid identity (sometimes referred to as "hard homology"). 
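
By way of illustration only, homology calculated on the basis of amino acid identity may be sketched for two sequences that have already been aligned to the same length. This is a plain ungapped comparison and not the BESTFIT, PILEUP or BLAST calculation; the function names and default values (a 70% threshold over a region of at least 15 residues) are merely taken from the ranges given above.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical residues between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def is_homologue(seq_a, seq_b, threshold=70.0, min_region=15):
    """True if the compared region is long enough and meets the identity threshold."""
    return len(seq_a) >= min_region and percent_identity(seq_a, seq_b) >= threshold
```

For example, two 20-residue regions differing at two positions have 90% identity and would satisfy the 70% and 80% thresholds but not the 95% threshold.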

For example the UWGCG Package provides the BESTFIT program which can be used to 
calculate homology (for example used on its default settings) (Devereux et al (1984) Nucleic Acids 
Research 12, p387-395). The PILEUP and BLAST algorithms can be used to calculate homology or 
line up sequences (such as identifying equivalent or corresponding sequences), typically on their 
default settings, for example as described in Altschul S. F. (1993) J Mol Evol 36:290-300; Altschul, 
S. F. et al (1990) J Mol Biol 215:403-10. 

Software for performing BLAST analyses is publicly available through the National Center 
for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first 
identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query 
sequence that either match or satisfy some positive-valued threshold score T when aligned with a 
word of the same length in a database sequence. T is referred to as the neighbourhood word score 
threshold (Altschul et al, supra). These initial neighbourhood word hits act as seeds for initiating 
searches to find HSPs containing them. The word hits are extended in both directions along each 
sequence for as far as the cumulative alignment score can be increased. Extensions for the word hits 
in each direction are halted when: the cumulative alignment score falls off by the quantity X from its 
maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one 
or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST 
algorithm parameters W, T and X determine the sensitivity and speed of the alignment. The BLAST 
program uses as defaults a word length (W) of 11, the BLOSUM62 scoring matrix (see Henikoff and 
Henikoff (1992) Proc. Natl Acad. Sci. USA 89: 10915-10919), alignments (B) of 50, expectation (E) 
of 10, M=5, N=4, and a comparison of both strands. 
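
By way of illustration only, the three halting conditions for extending a word hit may be sketched for an ungapped extension in one direction. This is not the BLAST implementation: the function name, the use of a simple match reward and mismatch penalty (here set to the M and N values quoted above) in place of a scoring matrix, and the default value of X are assumptions of this sketch.

```python
def extend_right(query, subject, q_start, s_start, x_drop=20, match=5, mismatch=4):
    """Extend a seed hit to the right without gaps.

    Returns (best cumulative score, number of residues in the best extension).
    Extension halts when the score falls x_drop below its maximum, when the
    score goes to zero or below, or when the end of either sequence is reached.
    """
    score = best = best_len = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        same = query[q_start + i] == subject[s_start + i]
        score += match if same else -mismatch
        i += 1
        if score > best:
            best, best_len = score, i
        if best - score >= x_drop or score <= 0:
            break
    return best, best_len
```

A complete extension would apply the same procedure leftwards from the seed and combine the two results; the larger the value of X, the further an extension can continue through a poorly matching stretch before it is abandoned.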

The BLAST algorithm performs a statistical analysis of the similarity between two 
sequences; see e.g., Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877. One 
measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which 
provides an indication of the probability by which a match between two amino acid sequences would 
occur by chance. For example, a sequence is considered similar to another sequence if the smallest 
sum probability in comparison of the first sequence to the second sequence is less than about 1, 
preferably less than about 0.1, more preferably less than about 0.01, and most preferably less than 
about 0.001. 

The homologous sequences typically differ by at least 2, 5, 10, 20 or more mutations (which 
may be substitutions, deletions or insertions of amino acids). The homologous sequences typically 
differ by at most 5, 10, 20 or more mutations (which may be substitutions, deletions or insertions of 
amino acids). Typically, up to 40% of the amino acids of the sequence are mutated. These mutations 
may be measured across any of the regions mentioned above in relation to calculating homology. The 
substitutions are preferably conservative substitutions. These are defined according to the following 
Table. Amino acids in the same block in the second column and preferably in the same line in the 
third column may be substituted for each other: 
Claims 

1. A method of performing electron microscopy of a molecular entity, comprising: 
providing a protein layer having a structure which repeats regularly in two dimensions and 
which supports molecular entities each attached at a predetermined position in the repeating structure 
of the protein layer; and 

performing electron microscopy of the protein layer having the molecular entities supported 
thereon to derive an image. 

2. A method according to claim 1, wherein the protein layer is a protein layer according to any 
one of claims 11 to 29. 

3. A method according to claim 1 or 2, wherein said step of providing a protein layer which 
supports molecular entities comprises making the protein layer and subsequently attaching the 
molecular entities thereto. 

4. A method according to claim 3, wherein the step of attaching the molecular entities to the 
protein layer is performed in solution. 

5. A method according to any one of claims 1 to 3, further comprising, prior to the step of 
performing electron microscopy, aligning the molecular entities with respect to the protein lattice. 

6. A method according to claim 5, wherein the step of aligning the molecular entities with 
respect to the protein lattice comprises applying an electric field to the protein lattice. 

7. A method according to claim 6, wherein the step of aligning the molecular entities with 
respect to the protein lattice comprises cooling the protein lattice to a minimum energy state. 

8. A method according to any one of claims 1 to 7, further comprising performing data analysis 
of the image. 

9. A method according to claim 8, wherein the data analysis is a two-dimensional 
crystallographic data analysis. 

10. A method according to claim 8, wherein the data analysis comprises identifying the 
components of the protein lattice and subtracting them from the image derived in said step of 
performing electron microscopy to derive an image of the molecular entities, and performing a single 
particle reconstruction of the image of the molecular entities. 

11. A protein layer which repeats regularly in two dimensions, 

the protein layer comprising protein protomers which each comprise at least two monomers 
genetically fused together, the monomers each being monomers of a respective oligomer assembly, 
the protomers comprising: 

a first monomer which is a monomer of a first oligomer assembly belonging to a dihedral 
point group of order O, where O equals 3, 4 or 6, and having a set of O rotational symmetry axes of 
order 2 extending in two dimensions; and 
a second monomer genetically fused to said first monomer which second monomer is a 
monomer of a second oligomer assembly having a rotational symmetry axis of order 2, 

the first monomers of the protomers are assembled into said first oligomer assemblies and the 
second monomers of the protomers are assembled into said second oligomer assemblies, said 
rotational symmetry axis of said second oligomer assemblies of order 2 being aligned with one of 
said set of rotational symmetry axes of order 2 of one of said first oligomer assemblies with two 
protomers being arranged symmetrically therearound. 

12. A protein layer according to claim 11, wherein the second oligomer assembly belongs to a 
dihedral point group of order 2 or to a cyclic point group of order 2. 

13. A protein layer according to claim 11 or 12, wherein the protomers are homologous with 
respect to the monomers. 

14. A protein layer according to claim 13, wherein said second oligomer assembly belongs to a 
dihedral point group of order 2. 

15. A protein layer according to claim 13, wherein the second oligomer assembly is a 
heterologous oligomer assembly of said second monomers and of third monomers, said protein layer 
further comprising said third monomers assembled with said second monomers into said second 
oligomer assembly. 

16. A protein layer according to claim 15, wherein the third monomers are monomers which have 
a binding site capable of binding to biotin or a peptide, and said second monomers are aptamers 
which are capable of binding to said binding site. 

17. A protein layer according to claim 16, wherein said third monomers are streptavidin. 

18. A protein layer according to claim 16 or 17, wherein said second monomers are Streptag I 
(SEQ ID NO. 3). 

19. A protein layer according to claim 11 or 12, wherein the protomers are heterologous with 
respect to the monomers. 

20. A protein layer according to claim 19, wherein the protein layer comprises protein protomers 
of two types, 

the first type of protomer comprising a first monomer which is a monomer of said first 
oligomer assembly belonging to a dihedral point group of order O, where O equals 3, 4, or 6, 
genetically fused to a second monomer which is a monomer of said second oligomer assembly, said 
second oligomer assembly being a heterologous oligomer assembly belonging to a cyclic point group 
of order 2, and 

the second type of protomer comprising a third monomer which is a monomer of said second 
oligomer assembly, genetically fused to a fourth monomer which is a monomer of a third oligomer 
assembly, said third oligomer assembly belonging to a dihedral point group of order 2 or O. 

21. A protein layer according to claim 20, wherein said third oligomer assembly belongs to a 
dihedral point group of order O, said third oligomer assembly being the same as said first oligomer 
assembly. 

22. A protein layer according to any one of claims 11 to 21, wherein each of said monomers of 
said respective oligomer assemblies either is a naturally occurring protein or is based on a naturally 
occurring protein with peptide elements being absent from, substituted in, or added to the naturally 
occurring protein without substantially affecting assembly of monomers of said respective oligomer 
assembly. 

23. A protein layer according to any one of claims 11 to 22, wherein, in said protomers, said 
monomers are genetically fused via a linking group. 

24. A protein layer according to claim 23, wherein the linking group is oriented relative to the 
first and second monomers in the protomer in its normal form prior to assembly to reduce any 
difference in the assembled layer in either or both of the position and orientation of (a) the termini of 
said first monomers in their arrangement in said first oligomer assembly in its natural form 
symmetrically around said one of said set of rotational symmetry axes of order N of said first 
oligomer assembly, and (b) the termini of said second monomers in their arrangement in said second 
oligomer assembly in its natural form symmetrically around said rotational symmetry axis of order N 
of said second oligomer assembly. 

25. A protein layer according to any one of claims 11 to 25 and having an array of molecular 
entities attached thereto. 

26. A protein layer according to claim 25, wherein a component of the protein layer has an 
affinity tag, the molecular entities being attached to respective affinity tags. 

27. A protein layer according to claim 25, wherein the molecular entity comprises a protein 
having a peptide affinity tag attached to a component of the protein layer. 

28. A protein layer according to claim 25, wherein the molecular entity comprises a protein, and 
both of a component of the protein layer and the molecular entity have respective affinity tags 
attached to each other. 

29. A protein layer according to claim 25, wherein the molecular entities are genetically fused 
within a component of the protein layer. 

30. A protein protomer comprising at least two monomers genetically fused together, the 
monomers each being monomers of a respective oligomer assembly into which the monomers are 
capable of self-assembly to assemble a protein layer which repeats regularly in two dimensions, 
wherein said protomer comprises: 

a first monomer which is a monomer of a first oligomer assembly belonging to a dihedral 
point group of order O, where O equals 3, 4 or 6, and having a set of O rotational symmetry axes of 
order 2 extending in two dimensions; and 

a second monomer genetically fused to said first monomer which second monomer is a 
monomer of a second oligomer assembly having a rotational symmetry axis of order 2. 

31. A polynucleotide encoding a protein protomer according to claim 30. 

32. A vector capable of expressing a protein protomer according to claim 30. 

33. A host cell comprising a vector according to claim 32. 

34. A method of performing x-ray crystallography, comprising: 

supporting an array of molecular entities on a protein layer according to any one of claims 25 
to 29; and 

performing x-ray crystallography or electron microscopy on the protein layer having the 
molecular entities supported thereon to derive an image. 